Unlocking the Potential of Genomic Data to Inform Typhoid Fever Control Policy: Supportive Resources for Genomic Data Generation, Analysis, and Visualization

Abstract The global response to the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic demonstrated the value of timely and open sharing of genomic data with standardized metadata to facilitate monitoring of the emergence and spread of new variants. Here, we make the case for the value of Salmonella Typhi (S. Typhi) genomic data and demonstrate the utility of freely available platforms and services that support the generation, analysis, and visualization of S. Typhi genomic data on the African continent and more broadly by introducing the Africa Centres for Disease Control and Prevention's Pathogen Genomics Initiative, SEQAFRICA, Typhi Pathogenwatch, TyphiNET, and the Global Typhoid Genomics Consortium.

Whole genome sequencing (WGS) is a powerful tool that can improve our understanding of typhoid epidemiology and the burden of antimicrobial resistance (AMR). WGS data can provide valuable information about transmission pathways that can inform targeting of interventions and has facilitated investigations of the molecular mechanisms of new resistance phenotypes, including extensively drug-resistant (XDR) typhoid in Pakistan [1]. By providing standardized information about AMR profiles in circulating bacteria, WGS has the potential to direct local treatment guidelines and policies to extend the lifespan of existing antimicrobials. Genomic data can also inform the prioritization of vaccine introduction, and phylogenetic analysis can be used to measure the impact of vaccines on bacterial populations. Despite the value of WGS for informing typhoid control, genomic surveillance is not universally implemented, and many barriers to access exist, including cost, supply chain challenges, and lack of access to bioinformatic training [2]. Here, we seek to highlight freely available resources to support the generation, analysis, and visualization of S. Typhi WGS data to inform policy.

AFRICA PATHOGEN GENOMICS INITIATIVE
In October 2020, the Africa Centres for Disease Control and Prevention (Africa CDC) launched a continent-wide pathogen genomics initiative (Africa PGI) jointly with public, philanthropic, and private sector partners with the short-term goal of accelerating severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequencing in Africa [3]. A longer-term ambition was to strengthen Africa's ability to use genomics for surveillance and rapid response to additional infectious disease threats, as well as to facilitate endemic disease control and elimination. Africa PGI has 4 major components:

1. A pan-African network of genomic laboratories and bioinformatics institutes. This includes providing national public health institutes and national reference laboratories with sequencing platforms, data systems, and quality assurance systems as well as the establishment of a continent-wide laboratory referral network.
2. Workforce development. The initiative will support training of laboratory technicians, bioinformaticians, and public health specialists to facilitate the generation and translation of pathogen genomic data to inform public health policy decisions.
3. Creating enabling mechanisms to support sustainable genomics-based surveillance in the continent. This includes developing harmonized best practices for the collection, storage, and utilization of biological specimens, managing biorepositories, and the ethical use of genomic data. In addition, the initiative will facilitate the donation of equipment, reagents, and supplies.
4. Implementation of priority pathogen genomics use-cases. The initiative will support the implementation of priority genomics use-cases as identified by member countries and the community of experts, who will support the sharing of best practices and affordable genomic tools, as well as setting standards and priority genomics use-cases in alignment with regional and country priorities.
These activities will contribute to the creation of a functional continent-wide pathogen genomics surveillance system, a skilled workforce, and tools to help the Africa CDC and national public health institutes fulfill their mandate of reducing the burden of infectious disease and proactively countering emerging and reemerging infections.
The Africa PGI continues to strengthen public health genomics and bioinformatics capacity in Africa through financial and in-kind support from the African Union, Bill & Melinda Gates Foundation, Illumina, US Centers for Disease Control and Prevention, Rockefeller Foundation, and Oxford Nanopore Technologies and is coordinated by the Africa CDC and the African Society for Laboratory Medicine. As of September 2022, a total of 31 member states have local sequencing capacity, as compared to only 7 in 2019. During this time, 43 sequencing machines were distributed to laboratories in 29 African Union member states, and supplies, reagents, and training were provided to >400 laboratories. This expanded sequencing capacity and the sample referral network have already contributed significantly to achieving key milestones, including >117 000 SARS-CoV-2 genomes shared via the Global Initiative on Sharing All Influenza Data (GISAID) as of September 2022, a 22-fold increase from December 2020. Of these sequences, >96% were generated by national and regional sequencing laboratories within the African continent.
In spite of these early successes, several challenges remain, including (1) scaling up genomics and bioinformatics capacity to include additional countries/regions and to augment sample volume capacity; (2) implementing standardized processes and protocols; (3) improving the supply chain for sequencing reagents and consumables; (4) consolidating the sample referral network; (5) training additional genomics and bioinformatics personnel; and (6) developing continental, regional, and national policies to enable effective implementation of public health pathogen genomics, real-time data sharing, and the integration of genomics into existing disease surveillance systems.

SEQAFRICA
While the cost of sequencing is decreasing, many laboratories and public health institutes that could benefit from sequence-derived information cannot afford to generate it. In addition to the monetary cost of generating high-quality genomic DNA, preparing and analyzing sequencing libraries, generating sequence reads, and performing quality assurance and analysis, the dearth of expertise required to perform these tasks poses a problem in many settings on the African continent [4]. Pathogen sequencing is most easily justified by showcasing data from sequencing [5]. This creates a ratchet effect where those that require sequencing cannot justify the need and therefore cannot access the monetary and training resources to be able to generate or analyze their own sequence data.
Whole genome sequencing is a key tool in understanding mechanisms and transmission dynamics of AMR. Recognizing this, the Fleming Fund, which has provided support for AMR surveillance in >20 low- and middle-income countries (LMICs), has funded SEQAFRICA. The goals of SEQAFRICA are to integrate WGS into AMR surveillance on the African continent and to sustainably build capacity for generating and understanding sequence data for surveillance. The SEQAFRICA consortium (https://antimicrobialresistance.dk/seqafrica.aspx) comprises regional sequencing centers in Nigeria, Tanzania, and South Africa; a national sequencing center in Ghana; and a South African coronavirus disease 2019 (COVID-19) response center (Figure 1). The consortium is coordinated by the Danish Technical University (DTU).
SEQAFRICA augments surveillance for pathogens of AMR importance. This includes the WHO priority organisms as well as species for which there may be local priorities. SEQAFRICA receives and reviews sequencing proposals from clients across the continent and provides free sequencing support for high-priority projects to inform AMR epidemiology. This modus operandi provides access to sequencing for high-priority initiatives that lack adequate resources and facilitates sequencing of small isolate collections of rarer organisms that can fill important surveillance gaps. To date, sequenced isolates have originated from public health institutes, African Fleming Fund Fellows leading surveillance in their respective countries, and other researchers from 15 different African countries. Completed genomes are placed in the public domain within 3 months of completion, so SEQAFRICA is rapidly filling critical surveillance and research gaps with bacterial genome data from Africa.
Established in 2018, SEQAFRICA is now more than three-quarters of the way toward achieving its goal of sequencing 16 000 genomes. Salmonella species, including S. Typhi, have been priorities from the start, although SEQAFRICA has received few S. Typhi sequencing requests. This lack of demand likely reflects the low isolation rate of S. Typhi on the continent, which is due to poor access to blood culture, as well as initial limited awareness of the program and the benefits of S. Typhi genome sequencing. Additionally, most S. Typhi isolates are obtained as part of externally funded term-limited research and surveillance projects, which often have their own arrangements for sequencing [6]. Such initiatives generate a wealth of data, but these tend to come from a smaller range of locales than routine, nationally owned sequencing, like that conducted by the National Institute for Communicable Diseases (NICD) in South Africa [7,8]. For countries establishing surveillance, AMR and S. Typhi surveillance are mutually enhancing. For example, Nigeria has found that boosting AMR surveillance in a system that includes a genomic surveillance component is one way to expand the number and diversity of S. Typhi genomes [9]. For countries that do not have their own genomic AMR surveillance, SEQAFRICA represents an excellent option to generate S. Typhi genomic data and is therefore synergistic with Fleming Fund Country grant projects focused on constructing the architecture of national surveillance systems, including building clinical microbiology at sentinel and national levels.
Through SEQAFRICA, Yamba et al sequenced invasive Salmonella isolated at the University Teaching Hospital, Zambia between January 2018 and December 2019 [10]. In total, 58 of 76 (76%) Salmonella enterica isolates from 7180 blood cultures performed during the study period were S. Typhi; 46 of these were sequenced at NICD in South Africa. Genomic surveillance uncovered AMR mechanisms for >50% of the S. Typhi that were multidrug-resistant (MDR), revealing that the most common cluster was closely genetically related to isolates from the 2010 typhoid outbreak in Zambia. Sequencing of contemporary S. Typhi isolates is currently underway at SEQAFRICA.
Awareness of the value of sequencing, as well as the understanding of how to implement it, will grow with training programs, including those offered by SEQAFRICA. The consortium has run online introductory courses on WGS in AMR surveillance; the WGS workflow (isolate to analysis); SARS-CoV-2 WGS; and basic bioinformatics using the command line. The consortium has also scheduled in-person courses for a subset of >100 online trainees in Ghana who completed online courses on WGS and advanced bioinformatics using the command line in September/October 2022 (https://antimicrobialresistance.dk/seqafrica/seqafrica-courses.aspx). In addition to the courses, coordinated sequencing at SEQAFRICA nodes has enabled staff working at the centers, who received in-person training at DTU at the start of the project, to gain additional experience. Specifically, public health institutes that acquired Illumina and/or Nanopore capacity during the COVID-19 pandemic [11] can now apply these tools to bacterial genome sequencing. Consequently, the potential for SEQAFRICA and complementary initiatives, such as the Africa Pathogen Genomics Initiative, to support S. Typhi surveillance is substantial.

TYPHI PATHOGENWATCH
Typhi Pathogenwatch [12] is a web application developed to support S. Typhi surveillance. The interface supports genome analytics, such as genotyping, detecting AMR determinants and plasmid replicons, and contextualization with genomic data. Typhi Pathogenwatch uses genome assemblies to perform 3 essential tasks for surveillance and epidemiological investigations: (1) placing isolates into lineages or clonal groups based on their genetic distance; (2) identifying their closest relatives and linking to their geographic distribution; and (3) detecting the presence of genes and mutations associated with AMR. In addition, Typhi Pathogenwatch provides compatibility with typing information for multilocus sequence typing (MLST), core genome MLST, in silico serotyping, the GenoTyphi genotyping scheme, and plasmid replicon sequences (PlasmidFinder).
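The second task above, identifying an isolate's closest relatives, can be illustrated with a minimal sketch. This is not Typhi Pathogenwatch's implementation: it assumes that genomes have already been reduced to equal-length strings of aligned variable sites, and the function names and toy sequences are hypothetical.

```python
def snp_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_relatives(query: str, references: dict[str, str], k: int = 3) -> list[tuple[str, int]]:
    """Return the k reference names with the smallest distance to the query.

    Ties are broken alphabetically by name so the result is deterministic.
    """
    distances = [(name, snp_distance(query, seq)) for name, seq in references.items()]
    return sorted(distances, key=lambda item: (item[1], item[0]))[:k]


# Toy aligned sites, invented for illustration only.
references = {
    "ref_A": "ACGTACGTAC",
    "ref_B": "ACGTACGTTT",
    "ref_C": "TTTTACGTAC",
}
query = "ACGTACGTAT"
print(closest_relatives(query, references, k=2))
```

In practice, the names returned would be joined to metadata such as country and year of isolation to show where an isolate's nearest relatives have been observed.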
The application can be accessed at https://pathogen.watch/styphi, where users can create an account to upload and analyze their genomes. User data remain private and stored in their personal account. Public genomes (N = 12 014 at the time of writing) available in Pathogenwatch with linked metadata are curated by the Global Typhoid Genomics Consortium. Metadata include, when available, country, date and source of isolation, travel information, patient age, and purpose of sampling. Users can browse and create custom collections of private and/or public genomes available in the application (https://pathogen.watch/genomes/all?organismId=90370) via a set of filters including country, date, MLST, genotype, and AMR (Figure 2).
The results for a single genome are displayed in a genome report that can be downloaded as a PDF. The results for a collection of genomes can be viewed online (Figure 3) and downloaded as trees and tables of genotypes, AMR predictions, assembly metrics, and typing information. Results can be accessed at a later date and shared via a collection identification number embedded in a unique weblink, thus facilitating international collaboration. An example of the output is shown in [...]. Typhi Pathogenwatch combines accurate genomic predictions of AMR with broad geographic and population context within an easy-to-use interface accessible to users of all bioinformatics skill levels. Global genotype and AMR data generated by Typhi Pathogenwatch are utilized and aggregated by the TyphiNET dashboard. This approach allows the rapid and incremental addition of new data and can be used to underpin surveillance of typhoid and public health decision making at the local, national, and international scales.

TYPHINET: AN ONLINE AMR SURVEILLANCE DASHBOARD FOR GLOBAL GENOMIC SURVEILLANCE OF S. TYPHI
TyphiNET is a recently developed online resource that aims to provide easy access to genome-derived data on the global distribution of S. Typhi genotypes and AMR determinants (http://typhi.net) [13]. TyphiNET empowers users to explore global trends in genome-derived metrics of public health utility, including AMR and genotype frequencies summarized down to national annual prevalence levels, without specialist computing technologies or bioinformatics expertise. Genotype and AMR data are imported directly from Typhi Pathogenwatch and filtered to include only genome collections representing nontargeted sampling, suitable for estimating national annual prevalence data. Input data can be further filtered to include only specified time periods, or to exclude data derived from returning travelers. Users can generate current reports and data visualizations of typhoid populations at a global and/or country level via any web browser.
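The filtering and aggregation described above can be sketched as follows. This is a minimal illustration rather than TyphiNET's code: the record keys (`country`, `year`, `purpose_of_sampling`, `travel_associated`) and the marker name `mdr` are hypothetical, and the records are invented.

```python
from collections import defaultdict


def national_annual_prevalence(records, marker, include_travel=True):
    """Fraction of genomes carrying `marker`, per (country, year).

    Only records flagged as nontargeted sampling are counted; travel-associated
    records are dropped when include_travel is False.
    """
    counts = defaultdict(lambda: [0, 0])  # (country, year) -> [positive, total]
    for record in records:
        if record["purpose_of_sampling"] != "nontargeted":
            continue
        if not include_travel and record["travel_associated"]:
            continue
        key = (record["country"], record["year"])
        counts[key][1] += 1
        if record[marker]:
            counts[key][0] += 1
    return {key: positive / total for key, (positive, total) in counts.items()}


# Invented example records (not real surveillance data).
records = [
    {"country": "A", "year": 2019, "purpose_of_sampling": "nontargeted",
     "travel_associated": False, "mdr": True},
    {"country": "A", "year": 2019, "purpose_of_sampling": "nontargeted",
     "travel_associated": True, "mdr": False},
    {"country": "A", "year": 2019, "purpose_of_sampling": "targeted",
     "travel_associated": False, "mdr": True},
    {"country": "B", "year": 2020, "purpose_of_sampling": "nontargeted",
     "travel_associated": False, "mdr": False},
]
print(national_annual_prevalence(records, "mdr"))
print(national_annual_prevalence(records, "mdr", include_travel=False))
```

Excluding targeted collections (for example, outbreak investigations or isolates selected for resistance) matters because they would otherwise inflate the estimated frequency of the trait that motivated the sampling.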
Global patterns of national genotype and AMR frequencies can be visualized on a world map (Figure 4), with countries colored to indicate prevalence ranges (estimated from genome data) for clinically relevant AMR phenotypes. Categories of AMR that can be visualized in this way are MDR (resistant to the classical first-line drugs chloramphenicol, co-trimoxazole, and ampicillin), azithromycin resistant, ciprofloxacin nonsusceptible, ciprofloxacin resistant, XDR (MDR plus ceftriaxone and ciprofloxacin resistant, ie, resistant to all oral drugs except for azithromycin), or susceptible to all relevant antimicrobials. Users can also generate maps showing the prevalence of genotype 4.3.1 (H58; commonly associated with AMR), the dominant genotype per country, and the number of genomes available for each country.
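The AMR categories defined above can be expressed as simple rules over per-drug predictions. The sketch below is illustrative only: the field names are hypothetical, and it assumes that a genome predicted to be ciprofloxacin resistant is also counted as ciprofloxacin nonsusceptible.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ResistanceProfile:
    """Predicted per-drug status for one genome (hypothetical field names)."""
    chloramphenicol: bool = False
    co_trimoxazole: bool = False
    ampicillin: bool = False
    ceftriaxone: bool = False
    azithromycin: bool = False
    ciprofloxacin_nonsusceptible: bool = False
    ciprofloxacin_resistant: bool = False


def amr_categories(profile: ResistanceProfile) -> set[str]:
    """Assign the categories described in the text to one genome."""
    if not any(astuple(profile)):
        return {"susceptible"}
    categories = set()
    # MDR: resistant to all three classical first-line drugs.
    mdr = profile.chloramphenicol and profile.co_trimoxazole and profile.ampicillin
    if mdr:
        categories.add("MDR")
    # XDR: MDR plus ceftriaxone and ciprofloxacin resistance.
    if mdr and profile.ceftriaxone and profile.ciprofloxacin_resistant:
        categories.add("XDR")
    if profile.azithromycin:
        categories.add("azithromycin resistant")
    # Assumption: resistant implies nonsusceptible.
    if profile.ciprofloxacin_nonsusceptible or profile.ciprofloxacin_resistant:
        categories.add("ciprofloxacin nonsusceptible")
    if profile.ciprofloxacin_resistant:
        categories.add("ciprofloxacin resistant")
    return categories


xdr_like = ResistanceProfile(chloramphenicol=True, co_trimoxazole=True, ampicillin=True,
                             ceftriaxone=True, ciprofloxacin_resistant=True)
print(sorted(amr_categories(xdr_like)))
```

Because the categories overlap (every XDR genome is also MDR), the function returns a set rather than a single label, which keeps the per-category prevalence counts independent of one another.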
The following plots can be viewed for either global data or a selected country (Figure 5): (1) AMR frequencies over time; (2) trends in genotype frequency over time; (3) the frequency of resistance to different drug classes among pathogen genotypes; and (4) the genes associated with AMR within pathogen genotypes. Country-level visualizations for Pakistan, using all available genomes from 2003-2020 including returning travelers, are shown in Figure 5 and illustrate the previously identified emergence and expansion of genotype 4.3.1.1.P1, responsible for the ongoing XDR outbreak in Pakistan [14,15]. TyphiNET visualizations show an increase in the proportion of S. Typhi that are resistant to classical first-line drugs, ciprofloxacin, and third-generation cephalosporins (3GC) (Figure 5A) from around 2017, coinciding with an increase in genotype 4.3.1.1.P1 (Figure 5B). Examination of resistance frequencies within genotypes demonstrates that 3GC resistance is associated only with genotype 4.3.1.1.P1 (Figure 5C) and mediated by a blaCTX-M-15 gene (Figure 5D).
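The within-genotype resistance frequencies of the kind plotted in Figure 5C amount to a grouped proportion. A minimal sketch follows, assuming each genome is represented as a hypothetical pair of a genotype label and the set of drug classes it is predicted to resist; the toy values are invented and are not the Pakistan data.

```python
from collections import defaultdict


def resistance_frequency_by_genotype(genomes, drug_class):
    """Fraction of genomes in each genotype predicted resistant to drug_class."""
    totals = defaultdict(int)
    resistant = defaultdict(int)
    for genotype, resistant_classes in genomes:
        totals[genotype] += 1
        if drug_class in resistant_classes:
            resistant[genotype] += 1
    return {genotype: resistant[genotype] / totals[genotype] for genotype in totals}


# Invented toy data for illustration only.
genomes = [
    ("4.3.1.1.P1", {"3GC", "ciprofloxacin"}),
    ("4.3.1.1.P1", {"3GC"}),
    ("4.3.1", {"ciprofloxacin"}),
    ("other", set()),
]
print(resistance_frequency_by_genotype(genomes, "3GC"))
```

A result in which only one genotype has a nonzero frequency is the pattern the text describes for 3GC resistance and genotype 4.3.1.1.P1.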
TyphiNET provides easy access to aggregated genome-derived resistance frequencies for clinically relevant antimicrobials utilized in controlling typhoid, making these data accessible for the first time to a broad range of users without genomics expertise. It is anticipated that these data will be of public health utility as they have the potential to inform control strategies. For example, overviews of resistance frequencies could assist in guiding empirical treatment of typhoid in LMIC settings where the disease is likely endemic but surveillance data are lacking. In high-income countries where most infections are travel-associated, individual treatment could be informed by resistance frequencies from the country or countries visited. High resistance frequencies to multiple antimicrobials may be informative for targeting other intervention strategies such as the programmatic use of typhoid conjugate vaccines in specific regions. The inclusion of data from travel-associated cases provides informal sentinel surveillance for resistance in countries in which S. Typhi surveillance data are not available [16].
TyphiNET was developed as an open-source MERN (MongoDB, Express, React, Node.js) stack JavaScript application (code available at: https://github.com/zadyson/TyphiNET). Source, genotype [17-19], and AMR data for sequences from "nontargeted" sampling frames are imported regularly from Typhi Pathogenwatch [12] and curated by contributors to the Global Typhoid Genomics Consortium.
The "resistance determinants within genotype" plot shows the distribution of specific genes and mutations mediating resistance to a selected drug class within common pathogen genotypes, with bars colored as per the inset legend. Buttons at the bottom of the page allow users to download a PDF report of the current plot views, as well as a line list of all genome-derived data in plain-text comma-separated values (CSV) format. The camera button allows users to download visualizations.

GLOBAL TYPHOID GENOMICS CONSORTIUM
The Global Typhoid Genomics Consortium was established in April 2021 to provide a mechanism for the global typhoid research community to engage in the aggregation of S. Typhi genomic data, to facilitate monitoring the emergence and spread of AMR and to inform targeted public health action. The specific goals of the consortium are to (1) encourage prompt sharing of typhoid genome data for public health benefit; (2) facilitate the extraction and reporting of key data of public health relevance; and (3) promote and facilitate the dissemination and use of information derived from typhoid genomic data to monitor AMR and postvaccination impact.
The key activities of the consortium are to encourage and coordinate sharing and release of typhoid genomics data in a manner that maximizes its potential to inform public health. The consortium does not seek to generate or claim ownership of any genome data; rather, the model is to encourage data generators to deposit raw genome data into public databases and share source information using a standardized metadata template, available at https://bit.ly/typhiMeta. This model facilitates harmonization of source information across aggregated S. Typhi data, which is crucial to allow downstream integration of the data for public health benefit, while ensuring that consortium members retain full control over their data and when they choose to make it public. Typhi Pathogenwatch is used as the central analysis platform to generate inferred genotypes and AMR determinants from raw genomes, and to maintain a publicly available and searchable database of genome assemblies and an interactive global phylogeny. Genotyping is conducted using the GenoTyphi genotyping framework [17,18], whose ongoing curation will be managed by a working group of the consortium. A key field in the consortium metadata template is "purpose of sampling," which seeks to identify sets of genomes that are derived from "nontargeted" sampling frames that are suitable for estimation of national annual prevalence rates of AMR and genotypes (eg, in the TyphiNET dashboard, described above, and other reports).
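The role of the "purpose of sampling" field can be illustrated with a small metadata check. This sketch does not reproduce the consortium's template: the field names, the set of allowed values, and the helper functions are all hypothetical stand-ins for whatever the template at https://bit.ly/typhiMeta actually specifies.

```python
# Hypothetical field names; the real template defines its own columns.
REQUIRED_FIELDS = ("country", "collection_date", "source", "purpose_of_sampling")
# Hypothetical controlled vocabulary for the purpose-of-sampling field.
ALLOWED_PURPOSES = {"nontargeted", "targeted", "unknown"}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one metadata record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")
    purpose = str(record.get("purpose_of_sampling") or "").strip().lower()
    if purpose and purpose not in ALLOWED_PURPOSES:
        problems.append(f"unrecognized purpose_of_sampling: {purpose}")
    return problems


def suitable_for_prevalence(record: dict) -> bool:
    """True only for complete records from a nontargeted sampling frame."""
    if validate_record(record):
        return False
    return record["purpose_of_sampling"].strip().lower() == "nontargeted"


record = {"country": "A", "collection_date": "2021-05", "source": "blood",
          "purpose_of_sampling": "Nontargeted"}
print(validate_record(record), suitable_for_prevalence(record))
```

Checking records at submission time, rather than at analysis time, is one way to keep aggregated source information harmonized across contributors.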
Consortium membership is free and open to all (https://www.typhoidgenomics.org/); the intention is to include all those with an academic or public health interest in using WGS to investigate, monitor, and/or understand typhoid epidemiology. Current membership (as of mid-2022) numbers >150 individuals from 39 countries. The majority of members are from countries in Africa and Asia where typhoid is endemic; however, there is also considerable participation from countries where typhoid is considered a travel-associated disease and subject to routine WGS, which provides useful "sentinel surveillance" data for common travel destinations [16]. Consortium activities are overseen by a multidisciplinary steering committee of international experts in typhoid surveillance and epidemiology, and an advisory board of stakeholders from the global public health community has been engaged to help identify ways to promote typhoid genomic surveillance, and particularly the use of pathogen WGS data, for public health benefit.

CONCLUSIONS AND NEXT STEPS
The global response to the SARS-CoV-2 pandemic has demonstrated the value of global genomic surveillance and timely data sharing in monitoring the emergence and spread of pathogenic strains, and one of its legacies will be broader access to and use of sequencing and phylogenetic analysis [20]. Such capacity could be expanded to include surveillance of S. Typhi and other priority pathogens if this is not already being done, and S. Typhi could be incorporated into genomic initiatives conducted at the local, national, and regional levels. Generating additional, more geographically representative S. Typhi WGS data and standardized metadata and sharing these data more broadly can facilitate a better understanding of the global distribution of drug-resistant S. Typhi, as well as how and where AMR emerges and spreads, thereby providing additional incentives for timely, open data sharing. By sharing information about platforms and programs that have been established to enable the generation, analysis, and visualization of S. Typhi genomic data, both on the African continent and more broadly, we hope to encourage further uptake of these freely available resources, which may lead to additional, sustainable generation of S. Typhi genomic data to inform decision making. In addition, we hope that governments, donors, and other stakeholders continue to support the establishment and sustenance of molecular surveillance capacity by providing funding for procurement of equipment and reagents, training for bioinformaticians, establishment of quality-assured laboratories, and sample management.